# Lecture 04 Machine-Level Programming – part 1

Euhyun Moon, Ph.D.

Machine Learning Systems (MLSys) Lab
Computer Science and Engineering
Sogang University



Slides adapted from Randy Bryant and Dave O'Hallaron: Introduction to Computer Systems, CMU

### **Definitions**

- Architecture (ISA): The parts of a processor design that one needs to understand to write assembly code
  - "What is directly visible to software"
  - The "contract" or "blueprint" between hardware and software
- Microarchitecture: Implementation of the architecture

### Instruction Set Architectures

- The ISA defines:
  - The system's state (e.g., registers, memory, program counter)
  - The instructions the CPU can execute
  - The effect that each of these instructions will have on the system state



### **General ISA Design Decisions**

- Instructions
  - What instructions are available? What do they do?
  - How are they encoded?
- Registers
  - How many registers are there?
  - How wide are they?
- Memory
  - How do you specify a memory location?

### **Instruction Set Philosophies**

- Complex Instruction Set Computing (CISC):
   Add more and more elaborate and specialized instructions as needed
  - Lots of tools for programmers to use, but hardware must be able to handle all instructions
  - x86-64 is CISC, but only a small subset of instructions encountered with Linux programs
- Reduced Instruction Set Computing (RISC): Keep instruction set small and regular
  - Easier to build fast hardware
  - Let software do the complicated operations by composing simpler ones

#### **Mainstream ISAs**



x86

Designer Intel, AMD

**Bits** 16-bit, 32-bit and 64-bit

Introduced 1978 (16-bit), 1985 (32-bit), 2003

(64-bit)

Design CISC

**Type** Register-memory

**Encoding** Variable (1 to 15 bytes)

**Endianness** Little

Macbooks & PCs (Core i3, i5, i7, M) x86-64 Instruction Set



#### **ARM** architectures

**Designer** ARM Holdings

**Bits** 32-bit, 64-bit

Introduced 1985; 31 years ago

**Design** RISC

Type Register-Register

**Encoding** AArch64/A64 and AArch32/A32

use 32-bit instructions, T32 (Thumb-2) uses mixed 16- and 32-bit instructions. ARMv7 user-space compatibility<sup>[1]</sup>

Endianness Bi (little as default)

Smartphone-like devices (iPhone, iPad, Raspberry Pi) ARM Instruction Set



#### **RISC-V**

**Designer** University of California,

Berkeley

**Bits** 32 · 64 · 128

Introduced 2010

Design RISC

Type Load-store

**Encoding** Variable Endianness Little<sup>[1][3]</sup>

Mostly research (some traction in embedded)

RISC-V Instruction Set

#### **Architecture Sits at the Hardware Interface**



### Writing Assembly Code? In 2024?

- Chances are, you'll never write a program in assembly, but understanding assembly is the key to the machine-level execution model:
  - Behavior of programs in the presence of bugs
    - When high-level language model breaks down
  - Tuning program performance
    - Understand optimizations done/not done by the compiler
    - Understanding sources of program inefficiency
  - Implementing systems software
    - What are the "states" of processes that the OS must manage
    - Using special units (timers, I/O co-processors, etc.) inside processor!
  - Fighting malicious software
    - Distributed software is in binary form

# **Assembly Programmer's View**



- Address of next instruction
- Named registers
  - Together in "register file"
  - Heavily used program data
- Condition codes
  - Store status information about most recent arithmetic operation
  - Used for conditional branching

- Memory
  - Byte-addressable array
  - Code and user data
  - Includes the Stack (for supporting procedures)

## x86-64 Assembly "Data Types"

- Integral data of 1, 2, 4, or 8 bytes
  - Data values
  - Addresses
- Floating point data of 4, 8, 10 or 2x8 or 4x4 or 8x2
  - Different registers for those (e.g., %xmm1, %ymm2)
  - Come from extensions to x86 (SSE, AVX, ...)

Not covered in 3030

- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory
- Two common syntaxes
  - "AT&T": used by our course, slides, textbook, gnu tools, ...
  - "Intel": used by Intel documentation, Intel tools, ...
  - Must know which you're reading

### What is a Register?

- A location in the CPU that stores a small amount of data, which can be accessed very quickly (once every clock cycle)
- Registers have names, not addresses
  - In assembly, they start with % (e.g., %rsi)
- Registers are at the heart of assembly programming
  - They are a precious commodity in all architectures, but especially x86

### x86-64 Integer Registers – 64 bits wide

| %rax | %eax | % <b>r8</b>  | %r8d  |
|------|------|--------------|-------|
| %rbx | %ebx | % <b>r9</b>  | %r9d  |
| %rcx | %ecx | %r10         | %r10d |
| %rdx | %edx | %r11         | %r11d |
| %rsi | %esi | %r12         | %r12d |
| %rdi | %edi | %r13         | %r13d |
| %rsp | %esp | % <b>r14</b> | %r14d |
| %rbp | %ebp | %r15         | %r15d |

Can reference low-order 4 bytes (also low-order 2 & 1 bytes)

#### Some History: IA32 Registers – 32 bits wide



### Memory vs. Register

#### Memory

#### Register

- Addresses
  - 0x7FFFD024C3DC
- Big
  - ~ 8 GiB
- Slow
  - ~50-100 ns
- Dynamic
  - Can "grow" as needed while program runs

vs. Names

%rdi

vs. Small

 $(16 \times 8 B) = 128 B$ 

vs. Fast

sub-nanosecond timescale

**vs.** Static

fixed number in hardware

### **Three Basic Kinds of Instructions**

- 1) Transfer data between memory and register
  - Load data from memory into register
    - %reg = Mem[address]
  - Store register data into memory
    - Mem[address] = %reg

Remember: Memory is indexed just like an array of bytes!

2) Perform arithmetic operation on register or memory data

```
• c = a + b; z = x << y; i = h & g;
```

- 3) Control flow: what instruction to execute next
  - Unconditional jumps to/from procedures
  - Conditional branches

#### **Instruction Sizes**

Size specifiers

```
• \mathbf{b} = 1-byte "byte" \mathbf{w} = 2

\mathbf{1} = 4-byte "long word" \mathbf{q} = 8
```

```
\mathbf{w} = 2-byte "word"
\mathbf{q} = 8-byte "quad word"
```

 Note that due to backwards-compatible support for 8086 programs (16-bit machines!), "word" means 16 bits = 2 bytes in x86 instruction names

# **Operand Types**

- *Immediate:* Constant integer data (\$)
  - Examples: \$0x400, \$-533
  - Like C literal, but prefixed with `\$'
  - Encoded with 1, 2, 4, or 8 bytes depending on the instruction
- Register: 1 of 16 integer registers (%)
  - Examples: %rax, %r13
  - But %rsp reserved for special use
  - Others have special uses for particular instructions
- Memory: Consecutive bytes of memory at a computed address (())
  - Simplest example: (%rax)
  - Various other "address modes"

| %rax |
|------|
| %rcx |
| %rdx |
| %rbx |
| %rsi |
| %rdi |
| %rsp |
| %rbp |
|      |
| %rN  |

#### x86-64 Introduction

- Data transfer instruction (mov)
- Arithmetic operations
- Memory addressing modes
  - swap example

### **Moving Data**

- General form: mov source, destination
  - Missing letter ( ) specifies size of operands
  - Lots of these in typical code
- movb src, dst
  - Move 1-byte "byte"
- movw src, dst
  - Move 2-byte "word"

- movl src, dst
  - Move 4-byte "long word"
- movq src, dst
  - Move 8-byte "quad word"

### **Operand Combinations**



- Cannot do memory-memory transfer with a single instruction
  - How would you do it?

## **Some Arithmetic Operations**

- Binary (two-operand) Instructions:
  - Maximum of one memory operand
  - Beware argument order!
  - No distinction between signed and unsigned
    - Only arithmetic vs. logical shifts

| Format |                        | Computation |                  |                    |  |
|--------|------------------------|-------------|------------------|--------------------|--|
| addq   | src,                   | dst         | dst = dst + src  | (dst += src)       |  |
| subq   | src,                   | dst         | dst = dst - src  |                    |  |
| imulq  | src,                   | dst         | dst = dst * src  | signed mult        |  |
| sarq   | src,                   | dst         | dst = dst >> src | <b>A</b> rithmetic |  |
| shrq   | src,                   | dst         | dst = dst >> src | Logical            |  |
| shlq   | src,                   | dst         | dst = dst << src | (same as salq)     |  |
| xorq   | src,                   | dst         | dst = dst ^ src  |                    |  |
| andq   | src,                   | dst         | dst = dst & src  |                    |  |
|        | src,                   |             | •                |                    |  |
| 1      | operand size specifier |             |                  |                    |  |

How do you implement "rcx = rax + rbx"?

### **Some Arithmetic Operations**

Unary (one-operand) Instructions:

| Format   | Computation   |                    |
|----------|---------------|--------------------|
| incq dst | dst = dst + 1 | increment          |
| decq dst | dst = dst - 1 | decrement          |
| negq dst | dst = -dst    | negate             |
| notq dst | dst = ∼dst    | bitwise complement |

• See CS:APP3e textbook Section 3.5.5 for more instructions: mulq, cqto, idivq, divq

### **Arithmetic Example**

```
long simple_arith(long x, long y)
{
  long t1 = x + y;
  long t2 = t1 * 3;
  return t2;
}
```

```
RegisterUse(s)%rdi1st argument (x)%rsi2nd argument (y)%raxreturn value
```

```
y += x;
y *= 3;
long r = y;
return r;
```

```
simple_arith:
  addq %rdi, %rsi
  imulq $3, %rsi
  movq %rsi, %rax
  ret
```

### **Example of Basic Addressing Modes**

```
void swap(long *xp, long *yp)
{
  long t0 = *xp;
  long t1 = *yp;
  *xp = t1;
  *yp = t0;
}
```

```
swap:
    movq (%rdi), %rax
    movq (%rsi), %rdx
    movq %rdx, (%rdi)
    movq %rax, (%rsi)
    ret
```

#### Compiler Explorer:

https://godbolt.org/z/zc4Pcq

## **Summary**

- x86-64 is a complex instruction set computing (CISC) architecture
  - There are 3 types of operands in x86-64
    - Immediate, Register, Memory
  - There are 3 types of instructions in x86-64
    - Data transfer, Arithmetic, Control Flow